[SPARK-46294][SQL] Clean up semantics of init vs zero value by davintjong-db · Pull Request #44222 · apache/spark

davintjong-db · 2023-12-07T00:34:51Z

What changes were proposed in this pull request?

Clean up the semantics of initValue and zeroValue as follows. This also helps define what an "invalid" metric is.

initValue is the starting value for a SQLMetric. If a metric's value equals its initValue, it can (and should) be filtered out before aggregating with SQLMetrics.stringValue().

zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any update, and it reports zeroValue as its value so that the invalid value is never exposed to users programmatically (a concern previously addressed in SPARK-41442).

For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. Whenever an invalid metric is updated, it sets itself to zeroValue and becomes valid. Invalid metrics will be filtered out when calculating min, max, etc. as a workaround for SPARK-11013.
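The rules above can be sketched as a small class. SketchMetric is a hypothetical, simplified stand-in, not Spark's SQLMetric: it does not extend AccumulatorV2, and it assumes "valid" means a raw value greater than or equal to zeroValue, which is how this description defines zeroValue.

```scala
// Hypothetical, simplified stand-in for SQLMetric. It does not extend Spark's
// AccumulatorV2; it only models the initValue/zeroValue rules described above.
final class SketchMetric(val initValue: Long = 0L, val zeroValue: Long = 0L) {
  private var raw: Long = initValue

  // True while the metric still holds its starting value, i.e. it is safe to
  // filter out before aggregation. Note: compares against initValue, not 0.
  def isZero: Boolean = raw == initValue

  // Assumption: "valid" means the raw value is at least zeroValue.
  def isValid: Boolean = raw >= zeroValue

  def add(v: Long): Unit = {
    if (!isValid) raw = zeroValue // the first update makes an invalid metric valid
    raw += v
  }

  // Report zeroValue instead of leaking an invalid sentinel such as -1.
  def value: Long = if (isValid) raw else zeroValue
}

val m = new SketchMetric(initValue = -1L, zeroValue = 0L)
println((m.isZero, m.isValid, m.value)) // (true,false,0)
m.add(0L)
println((m.isZero, m.isValid, m.value)) // (false,true,0)
m.add(5L)
println(m.value) // 5
```

In this sketch, after add(0L) the metric holds its zero value yet isZero is false, because isZero compares against initValue rather than against 0.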

Why are the changes needed?

The semantics of initValue and _zeroValue in SQLMetrics are a little confusing, since they effectively mean the same thing. Changing them as described above makes them clearer, especially in defining what an "invalid" metric is.

Does this PR introduce any user-facing change?

No. This shouldn't change any behavior.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

cloud-fan · 2023-12-14T04:54:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala

-  // values before calculate max, min, etc.
-  private[this] var _value = initValue
-  private var _zeroValue = initValue
+class SQLMetric(val metricType: String,


Suggested change

-class SQLMetric(val metricType: String,
+class SQLMetric(
+    val metricType: String,

4 spaces indentation for multi-line parameters

cloud-fan · 2023-12-14T05:01:22Z

sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala

+  // This is used to filter out metrics. Metrics with value equal to initValue should
+  // be filtered out, since they are either invalid or safe to filter without changing
+  // the aggregation defined in [[SQLMetrics.stringValue]].
+  override def isZero: Boolean = _value == initValue


this is a bit tricky as isZero is not true when we actually have the zero value...

Let's enrich the comment to highlight that we may want to collect the 0 value for calculating min/max/avg. We can still link to SPARK-11013.
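Why filtering on initValue (rather than on 0) matters for min/max/avg can be shown with a small sketch. summarize and MetricSummary are hypothetical names, not Spark's SQLMetrics.stringValue; the sketch only assumes the rule from the description that per-task values equal to initValue are dropped before aggregating.

```scala
// Hypothetical helper, not Spark's SQLMetrics.stringValue: drop per-task
// values that still equal initValue, then aggregate whatever remains.
final case class MetricSummary(sum: Long, min: Long, max: Long)

def summarize(values: Seq[Long], initValue: Long): Option[MetricSummary] = {
  // Only the initValue sentinel is removed; a genuine 0 is kept so it can
  // still be the minimum (and would still count toward an average).
  val valid = values.filter(_ != initValue)
  if (valid.isEmpty) None
  else Some(MetricSummary(valid.sum, valid.min, valid.max))
}

// Two tasks never updated the metric (-1); one legitimately recorded 0.
val perTask = Seq(-1L, 0L, 7L, -1L, 3L)
println(summarize(perTask, initValue = -1L)) // Some(MetricSummary(10,0,7))
```

Because only values equal to initValue are dropped, the task that legitimately recorded 0 still counts and min is 0; discarding zeros as well would misreport min as 3.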

cloud-fan · 2023-12-14T23:38:29Z

thanks, merging to master!

Clean up semantics of init vs zero value

83c676d

github-actions bot added the SQL label Dec 7, 2023

davintjong-db added 3 commits December 6, 2023 16:41

Style

e6ec5f1

scalastyle

07b26a8

Fix, clarify comments

f2945d7

davintjong-db changed the title ~~[WIP][SPARK-46294][SQL] Clean up semantics of init vs zero value~~ [SPARK-46294][SQL] Clean up semantics of init vs zero value Dec 7, 2023

cloud-fan reviewed Dec 14, 2023


cloud-fan approved these changes Dec 14, 2023


elaborate comment

6f98fcd

davintjong-db requested a review from cloud-fan December 14, 2023 18:36

cloud-fan closed this in 50e668c Dec 14, 2023

jiwen624 mentioned this pull request Feb 26, 2025

[SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value #50055

Closed

davintjong-db wants to merge 5 commits into apache:master from davintjong-db:sqlmetric-initvalue-refactor
